{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Modin (Pandas on Ray)\n",
    "\n",
    "**GOAL:** Learn to increase the speed of Pandas workflows by changing a single line of code.\n",
    "\n",
    "[Modin](https://modin.readthedocs.io/en/latest/?badge=latest) (Pandas on Ray) is a project aimed at speeding up Pandas using Ray.\n",
    "\n",
    "### Using Modin\n",
    "\n",
    "To use Modin, only a single line of code must be changed.\n",
    "\n",
    "Simply change:\n",
    "```python\n",
    "import pandas as pd\n",
    "```\n",
    "to\n",
    "```python\n",
    "import modin.pandas as pd\n",
    "```\n",
    "\n",
    "Changing this line of code will allow you to use all of the cores in your machine to do computation on your data. One of the major performance bottlenecks of Pandas is that it only uses a single core for any given computation. **Modin** exposes an API that is identical to Pandas, allowing you to continue interacting with your data as you would with Pandas. **There are no additional commands required to use Modin locally.** Partitioning, scheduling, data transfer, and other related concerns are all handled by **Modin** and **Ray** under the hood.\n",
    "\n",
    "### Concept for Exercise: DataFrame Constructor\n",
    "\n",
    "Often when playing around in Pandas, it is useful to create a DataFrame with the constructor. That is where we will start.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "frame_data = np.random.randint(0, 100, size=(2**10, 2**5))\n",
    "df = pd.DataFrame(frame_data)\n",
    "```\n",
    "\n",
    "The above code creates a Pandas DataFrame full of random integers with 1024 rows and 128 columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
    "import numpy as np\n",
    "import pandas\n",
    "import subprocess\n",
    "import sys"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**EXERCISE:** Modify the code below to make the dataframe a `modin.pandas` DataFrame (remember the line of code to change)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Implement your answer here. You are also free to play with the size\n",
    "# and shape of the DataFrame, but beware of exceeding your memory!\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "frame_data = np.random.randint(0, 100, size=(2**10, 2**5))\n",
    "df = pd.DataFrame(frame_data)\n",
    "\n",
    "# ***** Do not change the code below! It verifies that \n",
    "# ***** the exercise has been done correctly. *****\n",
    "\n",
    "try:\n",
    "    assert df is not None\n",
    "    assert frame_data is not None\n",
    "    assert isinstance(frame_data, np.ndarray)\n",
    "except:\n",
    "    raise AssertionError('Don\\'t change too much of the original code!')\n",
    "assert 'modin.pandas' in sys.modules, 'Not quite correct. Remember the single line of code change (See above)'\n",
    "assert hasattr(df, '_query_compiler'), 'Make sure that df is a modin.pandas DataFrame.'\n",
    "\n",
    "print(\"Success! You only need to change one line of code!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have created a toy example for playing around with the DataFrame, let's print it out in different ways.\n",
    "\n",
    "### Concept for Exercise: Data Interaction and Printing\n",
    "\n",
    "When interacting with data, it is very imporant to look at different parts of the data (e.g. `df.head()`). Here we will show that you can print the `modin.pandas` DataFrame in the same ways you would Pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the first 10 lines.\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the DataFrame.\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Free cell for custom interaction (Play around here!)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`modin.pandas` is using all of the cores in your machine to read the CSV file much faster than Pandas!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Concept for Exercise: Identical API\n",
    "\n",
    "As previously mentioned, `modin.pandas` has an identical API to Pandas. In this section, we will go over some examples of how you can use `modin.pandas` to interact with your data in the same way you would with Pandas.\n",
    "\n",
    "**Note: `modin.pandas` does not yet have 100% of the Pandas API fully implemented or optimized. Some parameters are not implemented for some methods and some of the more obscure methods are not yet implemented. We are continuing to work toward 100% API coverage.**\n",
    "\n",
    "For a full list of implemented methods, visit the [Modin documentation](https://modin.readthedocs.io/en/latest/pandas_supported.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Transpose the data\n",
    "df.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a new column the same as Pandas\n",
    "df['New Column'] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete the first column\n",
    "del df[df.columns[0]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Some operation are not yet optimized, but they are implemented.\n",
    "df.fillna(value=0, axis=0, limit=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Some operations are not yet implemented, they will throw this error.\n",
    "df.kurtosis()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Free cell for custom interaction (Play around here!).\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
